Spaced Seeds Design Using Perfect Rulers

Authors

  • Lavinia Egidi
  • Giovanni Manzini
Abstract

We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that, for patterns of length up to a few hundred, our seeds have a larger weight, and hence a better filtration efficiency, than those known in the literature. In this context, we study in depth the specific case of Wichmann rulers and prove some preliminary results on the generalization of our approach to the larger class of unrestricted rulers.
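The two notions the abstract relies on can be made concrete with a small brute-force sketch. It is not the construction from the paper: it only checks the definitions as they are commonly stated. A ruler is taken here to be complete when its marks measure every integer distance from 1 up to its length (a perfect ruler is usually defined as a complete ruler with the fewest possible marks for that length, which this sketch does not verify), and a seed is taken to be lossless for pattern length `m` and `k` mismatches when every placement of `k` mismatches leaves at least one alignment of the seed untouched. The function names and the `'1'`/`'0'` seed encoding are illustrative assumptions.

```python
from itertools import combinations


def measured_distances(marks):
    """Set of all pairwise distances between ruler marks."""
    return {b - a for a, b in combinations(sorted(marks), 2)}


def is_complete_ruler(marks):
    """True if the marks measure every integer distance from 1 to the ruler length."""
    ordered = sorted(marks)
    length = ordered[-1] - ordered[0]
    return measured_distances(ordered) == set(range(1, length + 1))


def is_lossless(seed, m, k):
    """Brute-force losslessness check (exponential in k, small inputs only).

    seed: string of '1' (position must match) and '0' (don't care).
    m:    pattern length.
    k:    number of mismatches to tolerate.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    if span > m:
        return False
    for errors in combinations(range(m), k):
        bad = set(errors)
        # The seed "hits" if some shift keeps all care positions off the mismatches.
        hit = any(
            all(shift + i not in bad for i in care)
            for shift in range(m - span + 1)
        )
        if not hit:
            return False
    return True
```

For instance, `is_complete_ruler([0, 1, 4, 6])` holds because the six pairwise distances are exactly 1 through 6, and the contiguous seed `"111"` is lossless for two mismatches at pattern length 9 but not at length 8. The trade-off the abstract describes is between the seed weight (the number of `'1'` positions) and the smallest `m` for which such a check succeeds.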

Related resources

SpEED: fast computation of sensitive spaced seeds

SUMMARY Multiple spaced seeds represent the current state-of-the-art for similarity search in bioinformatics, with applications in various areas such as sequence alignment, read mapping, oligonucleotide design, etc. We present SpEED, a software program that computes highly sensitive multiple spaced seeds. SpEED can be several orders of magnitude faster and computes better seeds than the existin...

Spaced Seeds for Cross-species cDNA-to-genome Sequence Alignment

We review recent developments in spaced seed design for cross-species sequence alignment. We start with a brief overview of original ideas and early techniques, and then focus on more recent work on finding accurate (sensitive and specific) seeds for cross-species cDNA-to-genome alignment. These recent developments include methods and models for estimating seed specificity and determining sensi...

Evolving Golomb Rulers

A Golomb Ruler is defined as a ruler that has marks unevenly spaced at integer locations in such a way that the distance between any two marks is unique. Unlike usual rulers, they have the ability to measure more discrete measures than the number of marks they possess. Although the definition of a Golomb Ruler does not place any restriction on the length of the ruler, researchers are usually in...
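The defining property quoted above (every pairwise distance between marks is unique) can be checked directly. The following is a minimal sketch of that definition only, with a hypothetical function name; it is not taken from the cited work.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """True if no two pairs of marks are the same distance apart."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))
```

With the marks `[0, 1, 4, 6]`, the four marks measure six distinct distances, which illustrates how such a ruler measures more distances than it has marks.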

Optimal Spaced Seeds for Finding Homologous Coding Regions

We study the problem of computing optimal spaced seeds for identifying homologous coding DNA sequences in large genomic data sets. We develop two models of DNA sequence alignment in coding regions, and using data sets from human/Drosophila and human/mouse comparisons, we compute optimal spaced seeds using a dynamic programming algorithm. The seeds we identify are more sensitive by far at identi...

Indel seeds for homology search

We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguo...

Journal title:
  • Fundam. Inform.

Volume 131  Issue 

Pages  -

Publication date 2011